Understanding compound words: a new perspective from compositional systems in distributional semantics

Author

  • Marco Marelli
Abstract

In the present work I discuss CAOSS (Compounding as Abstract Processes in Semantic Space), a model that aims at capturing the semantic dynamics of compound processing in a data-driven framework. In CAOSS, word meanings are represented as vectors encoding lexical co-occurrences in a reference corpus (e.g., the meaning of snow will be based on how often snow appears with other words), according to the tenets of distributional semantics (e.g., Landauer & Dumais, 1997). A combinatorial procedure is induced following Guevara (2010): given two vectors (constituent words) u and v, their composed representation (the compound) can be computed as c = M ∗ u + H ∗ v, where M and H are weight matrices estimated from corpus examples. The matrices are trained using least squares regression, with the vectors of the constituents as independent words (car and wash, rail and way) as inputs and the vectors of example compounds (carwash, railway) as outputs, so that the similarity between M ∗ u + H ∗ v and c is maximized. In other words, the matrices are defined in order to recreate the compound examples as accurately as possible. Once the two weight matrices are estimated, they can be applied to any word pair in order to obtain a meaning representation for their combination. CAOSS is shown to correctly predict effects related to the processing of novel compounds, and in particular the impact of relational information. Moreover, model predictions are useful for understanding the role of semantic transparency in the processing of familiar compounds. Taken together, the model simulations indicate that a compositional perspective on compound-word meaning is crucial for understanding the processing of both novel and familiar combinations.
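
The estimation step described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `fit_composition_matrices` and `compose` are hypothetical, the vectors are random stand-ins rather than corpus-derived co-occurrence vectors, and the fit is plain ordinary least squares (which minimizes squared error between M ∗ u + H ∗ v and c) with no regularization or preprocessing assumed.

```python
import numpy as np


def fit_composition_matrices(U, V, C):
    """Estimate M and H by ordinary least squares.

    U, V, C are (n_examples, d) arrays holding, row by row, the vectors of
    the first constituent, the second constituent, and the attested compound.
    Returns (M, H), each of shape (d, d), minimizing ||C - (U M^T + V H^T)||^2.
    """
    d = U.shape[1]
    X = np.hstack([U, V])                      # (n, 2d) design matrix
    W, *_ = np.linalg.lstsq(X, C, rcond=None)  # (2d, d) stacked weights
    return W[:d].T, W[d:].T


def compose(M, H, u, v):
    """Composed representation c = M u + H v for one constituent pair."""
    return M @ u + H @ v


# Synthetic stand-in data (assumption: real vectors would come from
# co-occurrence counts in a reference corpus, not a random generator).
rng = np.random.default_rng(0)
d, n = 4, 50
M_true = rng.normal(size=(d, d))
H_true = rng.normal(size=(d, d))
U = rng.normal(size=(n, d))          # first constituents (e.g. car, rail)
V = rng.normal(size=(n, d))          # second constituents (e.g. wash, way)
C = U @ M_true.T + V @ H_true.T      # noise-free "attested compounds"

M, H = fit_composition_matrices(U, V, C)

# Apply the estimated matrices to a word pair not seen during training.
u_new, v_new = rng.normal(size=d), rng.normal(size=d)
c_new = compose(M, H, u_new, v_new)
```

Stacking u and v into one input vector turns the two-matrix problem into a single regression, which is why one least-squares call recovers both M and H. With real corpus vectors the fit would not be exact, and the composed vector for a novel pair would be compared to other word vectors (for instance by cosine similarity) rather than to a known target.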

Similar resources

Coordination in Categorical Compositional Distributional Semantics

An open problem with categorical compositional distributional semantics is the representation of words that are considered semantically vacuous from a distributional perspective, such as determiners, prepositions, relative pronouns or coordinators. This paper deals with the topic of coordination between identical syntactic types, which accounts for the majority of coordination cases in language...

Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics

Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex word...

Detecting Learner Errors in the Choice of Content Words Using Compositional Distributional Semantics

We describe a novel approach to error detection in adjective–noun combinations. We present and release a new dataset of annotated errors where the examples are extracted from learner texts and annotated with error types. We show how compositional distributional semantic approaches can be applied to discriminate between correct and incorrect word combinations from learner data. Finally, we show ...

Distributional semantic phrases vs. semantic distributional nonsense: Adjective modification in compositional distributional semantics

In this talk, I discuss the ability of compositional distributional semantics to model adjective modification. I present three studies that explore the degree to which semantic intuitions are grounded in the distributional representations of adjective-noun phrases, as well as provide insight into various linguistic phenomena by extracting unsupervised cues from these distributional representati...

A relatedness benchmark to test the role of determiners in compositional distributional semantics

Distributional models of semantics capture word meaning very effectively, and they have been recently extended to account for compositionally-obtained representations of phrases made of content words. We explore whether compositional distributional semantic models can also handle a construction in which grammatical terms play a crucial role, namely determiner phrases (DPs). We introduce a new p...

Publication date: 2016